
Overview


Dataset statistics

Number of variables: 22
Number of observations: 3881
Missing cells: 25764
Missing cells (%): 30.2%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 3.6 MiB
Average record size in memory: 985.1 B
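
The missing-cell percentage can be recomputed from the counts in this block. The sketch below assumes the percentage is missing cells divided by variables times observations; the variable names are illustrative and not taken from the report.

```python
# Figures copied from the "Dataset statistics" block.
n_variables = 22
n_observations = 3881
missing_cells = 25764

# Assumption: every variable contributes one cell per observation.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
```

Under that assumption the result rounds to the 30.2% shown in the report.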

Variable types

Numeric: 6
Text: 5
Categorical: 11

Alerts

Unnamed: 0 is highly overall correlated with sample_type and 2 other fields (High correlation)
metastatic_site is highly overall correlated with sample_type and 2 other fields (High correlation)
mitotic_rate is highly overall correlated with source and 1 other fields (High correlation)
os_status is highly overall correlated with sample_type and 2 other fields (High correlation)
sample_coverage is highly overall correlated with source and 1 other fields (High correlation)
sample_type is highly overall correlated with Unnamed: 0 and 4 other fields (High correlation)
source is highly overall correlated with Unnamed: 0 and 10 other fields (High correlation)
stage_at_diagnosis is highly overall correlated with source and 2 other fields (High correlation)
treatment is highly overall correlated with os_status and 3 other fields (High correlation)
treatment_response is highly overall correlated with Unnamed: 0 and 2 other fields (High correlation)
tumor_grade is highly overall correlated with metastatic_site and 5 other fields (High correlation)
tumor_purity is highly overall correlated with tumor_grade (High correlation)
tumor_size is highly overall correlated with source and 1 other fields (High correlation)
treatment is highly imbalanced (62.1%) (Imbalance)
primary_site is highly imbalanced (56.1%) (Imbalance)
metastatic_site is highly imbalanced (54.9%) (Imbalance)
sample_id has 2522 (65.0%) missing values (Missing)
age_at_diagnosis has 182 (4.7%) missing values (Missing)
stage_at_diagnosis has 182 (4.7%) missing values (Missing)
tumor_size has 3267 (84.2%) missing values (Missing)
mitotic_rate has 3300 (85.0%) missing values (Missing)
treatment_response has 2541 (65.5%) missing values (Missing)
race has 637 (16.4%) missing values (Missing)
metastatic_site has 3085 (79.5%) missing values (Missing)
tumor_purity has 3037 (78.3%) missing values (Missing)
sample_coverage has 3085 (79.5%) missing values (Missing)
os_months has 710 (18.3%) missing values (Missing)
os_status has 666 (17.2%) missing values (Missing)
mutated_genes has 2545 (65.6%) missing values (Missing)
Unnamed: 0 is uniformly distributed (Uniform)
Unnamed: 0 has unique values (Unique)
mitotic_rate has 43 (1.1%) zeros (Zeros)

Reproduction

Analysis started: 2025-07-30 02:13:12.967521
Analysis finished: 2025-07-30 02:13:20.454857
Duration: 7.49 seconds
Software version: ydata-profiling v4.16.1
Download configuration: config.json
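
The reported duration follows from the two timestamps. A minimal sketch using only the standard library, with the timestamps copied from this block:

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" block.
started = datetime.fromisoformat("2025-07-30 02:13:12.967521")
finished = datetime.fromisoformat("2025-07-30 02:13:20.454857")

# Elapsed wall-clock time in seconds.
duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")
```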

Variables

Unnamed: 0
Real number (ℝ)

Alerts: High correlation, Uniform, Unique

Distinct: 3881
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1940
Minimum: 0
Maximum: 3880
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 30.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 194
Q1: 970
Median: 1940
Q3: 2910
95-th percentile: 3686
Maximum: 3880
Range: 3880
Interquartile range (IQR): 1940

Descriptive statistics

Standard deviation: 1120.4925
Coefficient of variation (CV): 0.57757347
Kurtosis: -1.2
Mean: 1940
Median Absolute Deviation (MAD): 970
Skewness: 0
Sum: 7529140
Variance: 1255503.5
Monotonicity: Strictly increasing
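
These figures are what a plain row index 0, 1, ..., 3880 would produce, which suggests (but does not prove) that `Unnamed: 0` is an index column saved along with the data. The sketch below recomputes the statistics for such a sequence; treating the column as exactly this sequence is an assumption.

```python
import statistics

# Assumed column contents: a strictly increasing row index 0..3880.
values = range(3881)

mean = statistics.mean(values)
variance = statistics.variance(values)  # sample variance (n - 1 denominator)
std = variance ** 0.5
cv = std / mean                         # coefficient of variation
total = sum(values)

print(mean, variance, round(std, 4), round(cv, 8), total)
```

If the match holds on the real data, the column carries no information beyond row order and is a candidate for exclusion before modelling.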
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
3880 | 1 | < 0.1%
0 | 1 | < 0.1%
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%
5 | 1 | < 0.1%
6 | 1 | < 0.1%
7 | 1 | < 0.1%
3864 | 1 | < 0.1%
Other values (3871) | 3871 | 99.7%

Minimum 10 values

Value | Count | Frequency (%)
0 | 1 | < 0.1%
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%
5 | 1 | < 0.1%
6 | 1 | < 0.1%
7 | 1 | < 0.1%
8 | 1 | < 0.1%
9 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
3880 | 1 | < 0.1%
3879 | 1 | < 0.1%
3878 | 1 | < 0.1%
3877 | 1 | < 0.1%
3876 | 1 | < 0.1%
3875 | 1 | < 0.1%
3874 | 1 | < 0.1%
3873 | 1 | < 0.1%
3872 | 1 | < 0.1%
3871 | 1 | < 0.1%

sample_id
Text

Alerts: Missing

Distinct: 1160
Distinct (%): 85.4%
Missing: 2522
Missing (%): 65.0%
Memory size: 173.8 KiB

Length

Max length: 17
Median length: 17
Mean length: 14.465048
Min length: 10

Characters and Unicode

Total characters: 19658
Distinct characters: 18
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1093
Unique (%): 80.4%

Sample

1st row: COSS1030183
2nd row: COSS1030184
3rd row: COSS1035469
4th row: COSS1035470
5th row: COSS1036012

Words

Value | Count | Frequency (%)
p-0001315-t02-im5 | 9 | 0.7%
p-0001315-t01-im3 | 9 | 0.7%
p-0002276-t01-im3 | 8 | 0.6%
p-0012178-t01-im5 | 7 | 0.5%
p-0002477-t01-im3 | 7 | 0.5%
p-0007513-t03-im5 | 6 | 0.4%
p-0004937-t01-im5 | 6 | 0.4%
p-0000501-t02-im3 | 6 | 0.4%
p-0005066-t01-im5 | 6 | 0.4%
p-0004937-t03-im6 | 6 | 0.4%
Other values (1150) | 1289 | 94.8%

Most occurring characters

Value | Count | Frequency (%)
0 | 3258 | 16.6%
- | 2388 | 12.1%
1 | 1871 | 9.5%
S | 1126 | 5.7%
2 | 1054 | 5.4%
6 | 1053 | 5.4%
3 | 917 | 4.7%
5 | 914 | 4.6%
4 | 836 | 4.3%
P | 796 | 4.0%
Other values (8) | 5445 | 27.7%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 19658 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 19658 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 19658 | 100.0%

Distinct: 3301
Distinct (%): 85.1%
Missing: 0
Missing (%): 0.0%
Memory size: 247.6 KiB

Length

Max length: 36
Median length: 9
Mean length: 8.2826591
Min length: 3

Characters and Unicode

Total characters: 32145
Distinct characters: 18
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 3018
Unique (%): 77.8%

Sample

1st row: 924209
2nd row: 924209
3rd row: 929361
4th row: 929361
5th row: 929884

Words

Value | Count | Frequency (%)
p-0001315 | 18 | 0.5%
p-0004937 | 12 | 0.3%
1464316 | 9 | 0.2%
p-0000134 | 8 | 0.2%
p-0004760 | 8 | 0.2%
p-0008084 | 8 | 0.2%
p-0002276 | 8 | 0.2%
p-0002594 | 8 | 0.2%
p-0006104 | 8 | 0.2%
p-0001157 | 8 | 0.2%
Other values (3291) | 3786 | 97.6%

Most occurring characters

Value | Count | Frequency (%)
0 | 4363 | 13.6%
1 | 4248 | 13.2%
2 | 3564 | 11.1%
6 | 2881 | 9.0%
3 | 2647 | 8.2%
4 | 2621 | 8.2%
9 | 2593 | 8.1%
5 | 2450 | 7.6%
8 | 2018 | 6.3%
7 | 1997 | 6.2%
Other values (8) | 2763 | 8.6%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 32145 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 32145 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 32145 | 100.0%

age_at_diagnosis
Real number (ℝ)

Alerts: Missing

Distinct: 78
Distinct (%): 2.1%
Missing: 182
Missing (%): 4.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 61.985401
Minimum: 7
Maximum: 90
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 30.4 KiB

Quantile statistics

Minimum: 7
5-th percentile: 37
Q1: 53
Median: 63
Q3: 72
95-th percentile: 84
Maximum: 90
Range: 83
Interquartile range (IQR): 19

Descriptive statistics

Standard deviation: 13.981321
Coefficient of variation (CV): 0.22555829
Kurtosis: 0.031991203
Mean: 61.985401
Median Absolute Deviation (MAD): 9
Skewness: -0.43409614
Sum: 229284
Variance: 195.47734
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
66 | 123 | 3.2%
67 | 120 | 3.1%
60 | 115 | 3.0%
65 | 112 | 2.9%
71 | 110 | 2.8%
64 | 107 | 2.8%
72 | 106 | 2.7%
70 | 106 | 2.7%
59 | 105 | 2.7%
69 | 101 | 2.6%
Other values (68) | 2594 | 66.8%
(Missing) | 182 | 4.7%

Minimum 10 values

Value | Count | Frequency (%)
7 | 1 | < 0.1%
11 | 1 | < 0.1%
12 | 2 | 0.1%
14 | 1 | < 0.1%
17 | 1 | < 0.1%
18 | 2 | 0.1%
19 | 5 | 0.1%
20 | 1 | < 0.1%
21 | 1 | < 0.1%
22 | 5 | 0.1%

Maximum 10 values

Value | Count | Frequency (%)
90 | 43 | 1.1%
89 | 11 | 0.3%
88 | 20 | 0.5%
87 | 22 | 0.6%
86 | 25 | 0.6%
85 | 34 | 0.9%
84 | 39 | 1.0%
83 | 38 | 1.0%
82 | 29 | 0.7%
81 | 40 | 1.0%

stage_at_diagnosis
Categorical

Alerts: High correlation, Missing

Distinct: 5
Distinct (%): 0.1%
Missing: 182
Missing (%): 4.7%
Memory size: 247.5 KiB

Length

Max length: 10
Median length: 9
Mean length: 8.3254934
Min length: 7

Characters and Unicode

Total characters: 30796
Distinct characters: 19
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Unknown
2nd row: Unknown
3rd row: Unknown
4th row: Unknown
5th row: Unknown

Common Values

Value | Count | Frequency (%)
Localized | 1381 | 35.6%
Unknown | 1355 | 34.9%
Metastatic | 517 | 13.3%
Regional | 374 | 9.6%
metastasis | 72 | 1.9%
(Missing) | 182 | 4.7%
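
The table lists both `Metastatic` and `metastasis`, which may be two spellings of the same stage coming from different sources. The sketch below shows one way to merge such labels before analysis. The mapping is hypothetical and would need to be confirmed against the data dictionary; the counts are copied from the table.

```python
from collections import Counter

# Counts copied from the "Common Values" table (missing values excluded).
raw_counts = Counter({
    "Localized": 1381,
    "Unknown": 1355,
    "Metastatic": 517,
    "Regional": 374,
    "metastasis": 72,
})

# Hypothetical mapping: treating "metastasis" as "Metastatic" is an assumption.
CANONICAL = {"metastasis": "Metastatic"}

def harmonize(label: str) -> str:
    """Return the canonical label, or the label unchanged if no rule applies."""
    return CANONICAL.get(label, label)

# Re-aggregate the counts under the canonical labels.
merged = Counter()
for label, count in raw_counts.items():
    merged[harmonize(label)] += count

for label, count in merged.most_common():
    print(label, count)
```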

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
localized | 1381 | 37.3%
unknown | 1355 | 36.6%
metastatic | 517 | 14.0%
regional | 374 | 10.1%
metastasis | 72 | 1.9%

Most occurring characters

Value | Count | Frequency (%)
n | 4439 | 14.4%
o | 3110 | 10.1%
a | 2933 | 9.5%
e | 2344 | 7.6%
i | 2344 | 7.6%
c | 1898 | 6.2%
l | 1755 | 5.7%
t | 1695 | 5.5%
L | 1381 | 4.5%
d | 1381 | 4.5%
Other values (9) | 7516 | 24.4%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 30796 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 30796 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 30796 | 100.0%

tumor_size
Real number (ℝ)

Alerts: High correlation, Missing

Distinct: 145
Distinct (%): 23.6%
Missing: 3267
Missing (%): 84.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.5754072
Minimum: 0.7
Maximum: 42
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 30.4 KiB

Quantile statistics

Minimum: 0.7
5-th percentile: 2.4
Q1: 5.125
Median: 8.5
Q3: 13
95-th percentile: 20
Maximum: 42
Range: 41.3
Interquartile range (IQR): 7.875

Descriptive statistics

Standard deviation: 5.8640107
Coefficient of variation (CV): 0.61240327
Kurtosis: 2.4200107
Mean: 9.5754072
Median Absolute Deviation (MAD): 3.5
Skewness: 1.2060716
Sum: 5879.3
Variance: 34.386621
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
11 | 26 | 0.7%
15 | 26 | 0.7%
8 | 25 | 0.6%
10 | 20 | 0.5%
12 | 19 | 0.5%
6.5 | 18 | 0.5%
7 | 16 | 0.4%
14 | 15 | 0.4%
6 | 12 | 0.3%
5 | 11 | 0.3%
Other values (135) | 426 | 11.0%
(Missing) | 3267 | 84.2%

Minimum 10 values

Value | Count | Frequency (%)
0.7 | 1 | < 0.1%
1 | 1 | < 0.1%
1.2 | 1 | < 0.1%
1.4 | 1 | < 0.1%
1.5 | 2 | 0.1%
1.7 | 3 | 0.1%
1.8 | 3 | 0.1%
1.9 | 2 | 0.1%
2 | 6 | 0.2%
2.1 | 2 | 0.1%

Maximum 10 values

Value | Count | Frequency (%)
42 | 1 | < 0.1%
36.5 | 1 | < 0.1%
34 | 1 | < 0.1%
29.5 | 1 | < 0.1%
27.7 | 1 | < 0.1%
26 | 1 | < 0.1%
25 | 8 | 0.2%
24 | 9 | 0.2%
21 | 5 | 0.1%
20.4 | 1 | < 0.1%

mitotic_rate
Real number (ℝ)

Alerts: High correlation, Missing, Zeros

Distinct: 63
Distinct (%): 10.8%
Missing: 3300
Missing (%): 85.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 17.58864
Minimum: 0
Maximum: 175
Zeros: 43
Zeros (%): 1.1%
Negative: 0
Negative (%): 0.0%
Memory size: 30.4 KiB
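
The percentages in this block do not all share a denominator. The figures are consistent with Distinct (%) being taken over non-missing rows while Zeros (%) and Missing (%) are taken over all rows. The sketch below checks that reading; it is an inference from the numbers, not a statement from the report.

```python
# Figures copied from the mitotic_rate overview and the dataset statistics.
n_rows = 3881
n_missing = 3300
n_distinct = 63
n_zeros = 43

n_present = n_rows - n_missing  # rows with a recorded mitotic_rate

# Assumed denominators: non-missing rows for distinct, all rows for the rest.
distinct_pct = 100 * n_distinct / n_present
zeros_pct = 100 * n_zeros / n_rows
missing_pct = 100 * n_missing / n_rows

print(f"{n_present} present, distinct {distinct_pct:.1f}%, "
      f"zeros {zeros_pct:.1f}%, missing {missing_pct:.1f}%")
```

Read this way, the 43 zeros are about 7.4% of the 581 recorded values, which is easy to miss when looking only at the 1.1% figure.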

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 2
Median: 7
Q3: 25
95-th percentile: 55
Maximum: 175
Range: 175
Interquartile range (IQR): 23

Descriptive statistics

Standard deviation: 23.43541
Coefficient of variation (CV): 1.3324174
Kurtosis: 8.0323429
Mean: 17.58864
Median Absolute Deviation (MAD): 6
Skewness: 2.4506504
Sum: 10219
Variance: 549.21842
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
2 | 60 | 1.5%
1 | 55 | 1.4%
0 | 43 | 1.1%
5 | 38 | 1.0%
10 | 29 | 0.7%
4 | 27 | 0.7%
50 | 27 | 0.7%
3 | 23 | 0.6%
20 | 23 | 0.6%
7 | 23 | 0.6%
Other values (53) | 233 | 6.0%
(Missing) | 3300 | 85.0%

Minimum 10 values

Value | Count | Frequency (%)
0 | 43 | 1.1%
1 | 55 | 1.4%
2 | 60 | 1.5%
3 | 23 | 0.6%
4 | 27 | 0.7%
5 | 38 | 1.0%
6 | 22 | 0.6%
7 | 23 | 0.6%
8 | 15 | 0.4%
9 | 5 | 0.1%

Maximum 10 values

Value | Count | Frequency (%)
175 | 1 | < 0.1%
145 | 1 | < 0.1%
130 | 1 | < 0.1%
125 | 1 | < 0.1%
112 | 1 | < 0.1%
104 | 1 | < 0.1%
102 | 1 | < 0.1%
100 | 4 | 0.1%
95 | 1 | < 0.1%
90 | 7 | 0.2%

treatment
Categorical

Alerts: High correlation, Imbalance

Distinct: 28
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 243.8 KiB

Length

Max length: 31
Median length: 7
Mean length: 7.2821438
Min length: 4

Characters and Unicode

Total characters: 28262
Distinct characters: 27
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 9
Unique (%): 0.2%

Sample

1st row: IMATINIB
2nd row: IMATINIB
3rd row: IMATINIB
4th row: IMATINIB
5th row: IMATINIB

Common Values

Value | Count | Frequency (%)
SURGERY | 2429 | 62.6%
IMATINIB | 621 | 16.0%
OTHER | 483 | 12.4%
UNKNOWN | 90 | 2.3%
SUNITINIB | 65 | 1.7%
CLINICAL_TRIAL | 50 | 1.3%
IMATINIB + SUNITINIB | 39 | 1.0%
REGORAFENIB | 32 | 0.8%
SORAFENIB | 15 | 0.4%
PAZOPANIB | 13 | 0.3%
Other values (18) | 44 | 1.1%

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
surgery | 2429 | 61.0%
imatinib | 669 | 16.8%
other | 483 | 12.1%
sunitinib | 105 | 2.6%
unknown | 97 | 2.4%
(blank) | 51 | 1.3%
clinical_trial | 50 | 1.3%
regorafenib | 32 | 0.8%
sorafenib | 17 | 0.4%
pazopanib | 14 | 0.4%
Other values (14) | 36 | 0.9%

Most occurring characters

Value | Count | Frequency (%)
R | 5500 | 19.5%
E | 3023 | 10.7%
U | 2642 | 9.3%
I | 2603 | 9.2%
S | 2558 | 9.1%
G | 2461 | 8.7%
Y | 2435 | 8.6%
T | 1347 | 4.8%
N | 1334 | 4.7%
A | 871 | 3.1%
Other values (17) | 3488 | 12.3%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 28262 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 28262 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 28262 | 100.0%

treatment_response
Categorical

Alerts: High correlation, Missing

Distinct: 6
Distinct (%): 0.4%
Missing: 2541
Missing (%): 65.5%
Memory size: 238.7 KiB

Length

Max length: 7
Median length: 2
Mean length: 3.9589552
Min length: 2

Characters and Unicode

Total characters: 5305
Distinct characters: 11
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: PR
2nd row: PR
3rd row: NR
4th row: PR
5th row: CR

Common Values

Value | Count | Frequency (%)
UNKNOWN | 525 | 13.5%
NR | 494 | 12.7%
PR | 133 | 3.4%
CR | 83 | 2.1%
SD | 63 | 1.6%
NE | 42 | 1.1%
(Missing) | 2541 | 65.5%

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
unknown | 525 | 39.2%
nr | 494 | 36.9%
pr | 133 | 9.9%
cr | 83 | 6.2%
sd | 63 | 4.7%
ne | 42 | 3.1%

Most occurring characters

Value | Count | Frequency (%)
N | 2111 | 39.8%
R | 710 | 13.4%
U | 525 | 9.9%
K | 525 | 9.9%
O | 525 | 9.9%
W | 525 | 9.9%
P | 133 | 2.5%
C | 83 | 1.6%
S | 63 | 1.2%
D | 63 | 1.2%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 5305 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 5305 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 5305 | 100.0%

primary_site
Categorical

Alerts: Imbalance

Distinct: 24
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 259.0 KiB

Length

Max length: 37
Median length: 7
Mean length: 11.311002
Min length: 4

Characters and Unicode

Total characters: 43898
Distinct characters: 45
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 4
Unique (%): 0.1%

Sample

1st row: Stomach
2nd row: Stomach
3rd row: Small Intestine
4th row: Small Intestine
5th row: Small Intestine

Common Values

Value | Count | Frequency (%)
Stomach | 2153 | 55.5%
Small Intestine | 1071 | 27.6%
Soft Tissue | 101 | 2.6%
Colon And Rectum (Excluding Appendix) | 98 | 2.5%
Abdomen/Intraabdominal | 88 | 2.3%
Digestive Other | 76 | 2.0%
Colon/Rectum | 63 | 1.6%
GI Tract (Indeterminate) | 61 | 1.6%
Retroperitoneum | 55 | 1.4%
Retroperitoneum And Peritoneum | 29 | 0.7%
Other values (14) | 86 | 2.2%

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
stomach | 2153 | 37.6%
small | 1071 | 18.7%
intestine | 1071 | 18.7%
and | 129 | 2.3%
appendix | 103 | 1.8%
tissue | 101 | 1.8%
soft | 101 | 1.8%
colon | 98 | 1.7%
rectum | 98 | 1.7%
excluding | 98 | 1.7%
Other values (29) | 701 | 12.2%

Most occurring characters

Value | Count | Frequency (%)
t | 5214 | 11.9%
m | 3749 | 8.5%
a | 3713 | 8.5%
e | 3393 | 7.7%
S | 3325 | 7.6%
n | 3193 | 7.3%
o | 3024 | 6.9%
l | 2542 | 5.8%
c | 2499 | 5.7%
h | 2258 | 5.1%
Other values (35) | 10988 | 25.0%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 43898 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 43898 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 43898 | 100.0%

sample_type
Categorical

Alerts: High correlation

Distinct: 4
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 245.9 KiB

Length

Max length: 16
Median length: 7
Mean length: 7.8433393
Min length: 7

Characters and Unicode

Total characters: 30440
Distinct characters: 21
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Primary
2nd row: Primary
3rd row: Metastasis
4th row: Metastasis
5th row: Unknown

Common Values

Value | Count | Frequency (%)
Unknown | 2532 | 65.2%
Metastasis | 659 | 17.0%
Primary | 546 | 14.1%
Local Recurrence | 144 | 3.7%
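
Although `Unknown` covers 65.2% of rows, `sample_type` does not appear among the Imbalance alerts. One plausible explanation is that the imbalance score is entropy-based rather than a simple majority share. The sketch below assumes the score is one minus the Shannon entropy normalized by the log of the class count; that definition is an assumption and is not stated anywhere in this report.

```python
import math

# Counts copied from the "Common Values" table.
counts = {"Unknown": 2532, "Metastasis": 659, "Primary": 546, "Local Recurrence": 144}

def imbalance_score(class_counts):
    """Assumed definition: 1 - H(p) / log2(k), where 0 means perfectly balanced."""
    class_counts = [c for c in class_counts if c > 0]
    k = len(class_counts)
    if k < 2:
        return 0.0
    total = sum(class_counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in class_counts)
    return 1 - entropy / math.log2(k)

score = imbalance_score(counts.values())
print(f"sample_type imbalance (assumed definition): {score:.3f}")
```

Under that assumption the score for `sample_type` comes out near 0.29, well below the 54.9% to 62.1% reported for the three flagged variables.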

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
unknown | 2532 | 62.9%
metastasis | 659 | 16.4%
primary | 546 | 13.6%
local | 144 | 3.6%
recurrence | 144 | 3.6%

Most occurring characters

Value | Count | Frequency (%)
n | 7740 | 25.4%
o | 2676 | 8.8%
U | 2532 | 8.3%
k | 2532 | 8.3%
w | 2532 | 8.3%
a | 2008 | 6.6%
s | 1977 | 6.5%
r | 1380 | 4.5%
t | 1318 | 4.3%
i | 1205 | 4.0%
Other values (11) | 4540 | 14.9%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 30440 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 30440 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 30440 | 100.0%

race
Categorical

Alerts: Missing

Distinct: 9
Distinct (%): 0.3%
Missing: 637
Missing (%): 16.4%
Memory size: 261.0 KiB

Length

Max length: 57
Median length: 5
Mean length: 12.78021
Min length: 5

Characters and Unicode

Total characters: 41459
Distinct characters: 31
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: White
2nd row: White
3rd row: White
4th row: White
5th row: White

Common Values

Value | Count | Frequency (%)
White | 2067 | 53.3%
Black | 465 | 12.0%
Other (American Indian/AK Native, Asian/Pacific Islander) | 443 | 11.4%
Black or African American | 99 | 2.6%
Unknown | 96 | 2.5%
Asian | 55 | 1.4%
Other | 15 | 0.4%
Not Provided | 3 | 0.1%
Native American | 1 | < 0.1%
(Missing) | 637 | 16.4%

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
white | 2067 | 35.9%
black | 564 | 9.8%
american | 543 | 9.4%
other | 458 | 8.0%
native | 444 | 7.7%
indian/ak | 443 | 7.7%
asian/pacific | 443 | 7.7%
islander | 443 | 7.7%
or | 99 | 1.7%
african | 99 | 1.7%
Other values (4) | 157 | 2.7%

Most occurring characters

Value | Count | Frequency (%)
i | 4983 | 12.0%
e | 3958 | 9.5%
a | 3477 | 8.4%
t | 2972 | 7.2%
n | 2757 | 6.6%
h | 2525 | 6.1%
(blank) | 2516 | 6.1%
c | 2092 | 5.0%
W | 2067 | 5.0%
r | 1645 | 4.0%
Other values (21) | 12467 | 30.1%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 41459 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 41459 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 41459 | 100.0%

gender
Categorical

Distinct: 2
Distinct (%): 0.1%
Missing: 5
Missing (%): 0.1%
Memory size: 234.9 KiB

Length

Max length: 6
Median length: 4
Mean length: 4.9468524
Min length: 4

Characters and Unicode

Total characters: 19174
Distinct characters: 6
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Male
2nd row: Male
3rd row: Female
4th row: Female
5th row: Female

Common Values

Value | Count | Frequency (%)
Male | 2041 | 52.6%
Female | 1835 | 47.3%
(Missing) | 5 | 0.1%

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
male | 2041 | 52.7%
female | 1835 | 47.3%

Most occurring characters

Value | Count | Frequency (%)
e | 5711 | 29.8%
a | 3876 | 20.2%
l | 3876 | 20.2%
M | 2041 | 10.6%
F | 1835 | 9.6%
m | 1835 | 9.6%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 19174 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 19174 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 19174 | 100.0%

metastatic_site
Categorical

Alerts: High correlation, Imbalance, Missing

Distinct: 32
Distinct (%): 4.0%
Missing: 3085
Missing (%): 79.5%
Memory size: 245.9 KiB

Length

Max length: 18
Median length: 14
Mean length: 11.182161
Min length: 4

Characters and Unicode

Total characters: 8901
Distinct characters: 37
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 8
Unique (%): 1.0%

Sample

1st row: Not Applicable
2nd row: Not Applicable
3rd row: Not Applicable
4th row: Not Applicable
5th row: Liver

Common Values

Value | Count | Frequency (%)
Not Applicable | 483 | 12.4%
Liver | 150 | 3.9%
Mesentery | 15 | 0.4%
Peritoneum | 15 | 0.4%
Abdomen | 13 | 0.3%
Pelvis | 13 | 0.3%
Small Bowel | 13 | 0.3%
Omentum | 11 | 0.3%
Spleen | 9 | 0.2%
Skin | 9 | 0.2%
Other values (22) | 65 | 1.7%
(Missing) | 3085 | 79.5%

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
not | 486 | 36.6%
applicable | 483 | 36.3%
liver | 150 | 11.3%
mesentery | 15 | 1.1%
peritoneum | 15 | 1.1%
pelvis | 15 | 1.1%
abdomen | 13 | 1.0%
small | 13 | 1.0%
bowel | 13 | 1.0%
omentum | 11 | 0.8%
Other values (29) | 115 | 8.7%

Most occurring characters

Value | Count | Frequency (%)
l | 1079 | 12.1%
p | 987 | 11.1%
e | 824 | 9.3%
i | 702 | 7.9%
o | 572 | 6.4%
a | 552 | 6.2%
t | 546 | 6.1%
(blank) | 533 | 6.0%
b | 507 | 5.7%
A | 504 | 5.7%
Other values (27) | 2095 | 23.5%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 8901 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 8901 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 8901 | 100.0%

tumor_purity
Real number (ℝ)

Alerts: High correlation, Missing

Distinct: 14
Distinct (%): 1.7%
Missing: 3037
Missing (%): 78.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 66.156398
Minimum: 10
Maximum: 90
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 30.4 KiB

Quantile statistics

Minimum: 10
5-th percentile: 30
Q1: 60
Median: 70
Q3: 80
95-th percentile: 90
Maximum: 90
Range: 80
Interquartile range (IQR): 20

Descriptive statistics

Standard deviation: 18.460607
Coefficient of variation (CV): 0.27904492
Kurtosis: 0.20032513
Mean: 66.156398
Median Absolute Deviation (MAD): 10
Skewness: -0.85498788
Sum: 55836
Variance: 340.79402
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=14)

Common values

Value | Count | Frequency (%)
80 | 221 | 5.7%
70 | 168 | 4.3%
60 | 150 | 3.9%
90 | 104 | 2.7%
50 | 71 | 1.8%
40 | 50 | 1.3%
30 | 44 | 1.1%
20 | 13 | 0.3%
10 | 9 | 0.2%
85 | 6 | 0.2%
Other values (4) | 8 | 0.2%
(Missing) | 3037 | 78.3%

Minimum 10 values

Value | Count | Frequency (%)
10 | 9 | 0.2%
15 | 2 | 0.1%
20 | 13 | 0.3%
30 | 44 | 1.1%
35 | 4 | 0.1%
40 | 50 | 1.3%
50 | 71 | 1.8%
60 | 150 | 3.9%
63 | 1 | < 0.1%
70 | 168 | 4.3%

Maximum 10 values

Value | Count | Frequency (%)
90 | 104 | 2.7%
85 | 6 | 0.2%
80 | 221 | 5.7%
73 | 1 | < 0.1%
70 | 168 | 4.3%
63 | 1 | < 0.1%
60 | 150 | 3.9%
50 | 71 | 1.8%
40 | 50 | 1.3%
35 | 4 | 0.1%

sample_coverage
Real number (ℝ)

High correlation  Missing 

Distinct: 406
Distinct (%): 51.0%
Missing: 3085
Missing (%): 79.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 670.40704
Minimum: 106
Maximum: 1270
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 30.4 KiB

Quantile statistics

Minimum: 106
5-th percentile: 330.5
Q1: 520
Median: 672.5
Q3: 812
95-th percentile: 1023
Maximum: 1270
Range: 1164
Interquartile range (IQR): 292

Descriptive statistics

Standard deviation: 211.57931
Coefficient of variation (CV): 0.31559828
Kurtosis: -0.24543072
Mean: 670.40704
Median Absolute Deviation (MAD): 146.5
Skewness: 0.029028888
Sum: 533644
Variance: 44765.804
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
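The report does not say which denominator each percentage uses. The sample_coverage figures are consistent with "Missing (%)" being taken over all rows and "Distinct (%)" over non-missing rows only. The sketch below checks that reading; it is an inference from the numbers, not a statement about the profiler's internals.

```python
# Figures copied from the sample_coverage summary in this report.
n_rows = 3881
n_missing = 3085
n_distinct = 406
reported_sum = 533644

n_valid = n_rows - n_missing                  # non-missing observations
missing_pct = 100 * n_missing / n_rows        # denominator: all rows
distinct_pct = 100 * n_distinct / n_valid     # denominator: non-missing rows
mean_from_sum = reported_sum / n_valid        # should match the reported Mean

print(f"{missing_pct:.1f}% missing, {distinct_pct:.1f}% distinct, mean {mean_from_sum:.5f}")
```

On this reading, roughly half of the 796 recorded coverage values are distinct, even though only about one row in five has a value.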
Common values

Value | Count | Frequency (%)
1132 | 10 | 0.3%
674 | 9 | 0.2%
1023 | 9 | 0.2%
920 | 7 | 0.2%
583 | 7 | 0.2%
808 | 7 | 0.2%
682 | 7 | 0.2%
780 | 7 | 0.2%
434 | 7 | 0.2%
677 | 7 | 0.2%
Other values (396) | 719 | 18.5%
(Missing) | 3085 | 79.5%

Minimum 10 values

Value | Count | Frequency (%)
106 | 2 | 0.1%
148 | 1 | < 0.1%
152 | 1 | < 0.1%
172 | 2 | 0.1%
176 | 2 | 0.1%
182 | 4 | 0.1%
184 | 1 | < 0.1%
189 | 1 | < 0.1%
205 | 1 | < 0.1%
206 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
1270 | 1 | < 0.1%
1243 | 1 | < 0.1%
1225 | 1 | < 0.1%
1152 | 1 | < 0.1%
1135 | 1 | < 0.1%
1132 | 10 | 0.3%
1108 | 1 | < 0.1%
1107 | 2 | 0.1%
1085 | 3 | 0.1%
1080 | 1 | < 0.1%

os_months
Text

Missing 

Distinct: 539
Distinct (%): 17.0%
Missing: 710
Missing (%): 18.3%
Memory size: 212.5 KiB

Length

Max length: 7
Median length: 4
Mean length: 4.4203721
Min length: 3

Characters and Unicode

Total characters: 14017
Distinct characters: 16
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 367
Unique (%): 11.6%

Sample

1st row: 11.079
2nd row: 11.079
3rd row: 11.079
4th row: 11.079
5th row: 11.079
Value | Count | Frequency (%)
0000 | 126 | 4.0%
0001 | 89 | 2.8%
0003 | 85 | 2.7%
0004 | 80 | 2.5%
0006 | 78 | 2.5%
0002 | 77 | 2.4%
0005 | 76 | 2.4%
0009 | 76 | 2.4%
0010 | 74 | 2.3%
0019 | 73 | 2.3%
Other values (529) | 2337 | 73.7%
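os_months is typed as Text even though its sample values (11.079) look numeric, and it has 16 distinct characters, more than the digits and a decimal point account for. One way to investigate is to attempt a numeric conversion and list the entries that fail. The sketch below uses only the standard library. Apart from "11.079", the input values are hypothetical stand-ins, not data from the report.

```python
def to_float(text):
    """Return text as a float, or None when it cannot be parsed."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None

# "11.079" comes from the report; the other entries are made up for illustration.
raw = ["11.079", "11.079", "3.5", "unknown", None]

parsed = [to_float(value) for value in raw]

# Entries that were present but not numeric are the ones worth inspecting by hand.
rejected = [value for value, number in zip(raw, parsed)
            if value is not None and number is None]

print(parsed)
print(rejected)
```

With pandas, `pd.to_numeric(series, errors="coerce")` does the same job on a whole column, turning unparseable entries into NaN.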

Most occurring characters

Value | Count | Frequency (%)
0 | 6226 | 44.4%
1 | 1377 | 9.8%
2 | 1001 | 7.1%
4 | 844 | 6.0%
3 | 808 | 5.8%
. | 742 | 5.3%
5 | 684 | 4.9%
7 | 608 | 4.3%
6 | 580 | 4.1%
9 | 572 | 4.1%
Other values (6) | 575 | 4.1%

Most occurring categories

(unknown): 14017 (100.0%)

Most occurring scripts

(unknown): 14017 (100.0%)

Most occurring blocks

(unknown): 14017 (100.0%)

treatment_start
Text

Distinct: 237
Distinct (%): 6.1%
Missing: 0
Missing (%): 0.0%
Memory size: 261.0 KiB

Length

Max length: 12
Median length: 12
Mean length: 11.832002
Min length: 10

Characters and Unicode

Total characters: 45920
Distinct characters: 21
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 168
Unique (%): 4.3%

Sample

1st row: MISSING_DATE
2nd row: MISSING_DATE
3rd row: MISSING_DATE
4th row: MISSING_DATE
5th row: MISSING_DATE
Value | Count | Frequency (%)
missing_date | 3555 | 91.6%
1899-12-30 | 8 | 0.2%
1900-01-18 | 5 | 0.1%
1900-01-13 | 4 | 0.1%
1905-05-29 | 4 | 0.1%
1899-12-29 | 4 | 0.1%
1900-01-17 | 4 | 0.1%
1900-01-21 | 3 | 0.1%
1905-06-21 | 3 | 0.1%
1900-01-09 | 3 | 0.1%
Other values (227) | 288 | 7.4%
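Two things stand out in this table. "Missing" is reported as 0 only because 3555 rows (91.6%) hold the sentinel string MISSING_DATE. The remaining dates cluster around 1899-12-30, the origin of Excel's serial-date system, which suggests (but does not prove) that small integers were converted to dates somewhere upstream. The sketch below treats the sentinel as missing and recovers the day offset from that origin; the function name `decode` is illustrative.

```python
from datetime import date

# Day 0 of Excel's serial-date system. That these values came from such a
# conversion is an assumption, not something the report states.
EXCEL_EPOCH = date(1899, 12, 30)

def decode(value):
    """Map the sentinel to None and a date string to its offset in days."""
    if value == "MISSING_DATE":
        return None
    return (date.fromisoformat(value) - EXCEL_EPOCH).days

for raw in ["MISSING_DATE", "1899-12-30", "1899-12-29", "1900-01-18",
            "1905-05-29", "1905-06-21"]:
    print(raw, "->", decode(raw))
```

The offsets come out as 0, -1, 19, 1976 and 1999. The last two look like calendar years and the small ones like day or month counts, so the column may mix several encodings and should be checked against the source data before any dates are trusted.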

Most occurring characters

Value | Count | Frequency (%)
I | 7110 | 15.5%
S | 7110 | 15.5%
M | 3555 | 7.7%
N | 3555 | 7.7%
G | 3555 | 7.7%
_ | 3555 | 7.7%
D | 3555 | 7.7%
A | 3555 | 7.7%
T | 3555 | 7.7%
E | 3555 | 7.7%
Other values (11) | 3260 | 7.1%

Most occurring categories

(unknown): 45920 (100.0%)

Most occurring scripts

(unknown): 45920 (100.0%)

Most occurring blocks

(unknown): 45920 (100.0%)

os_status
Categorical

High correlation  Missing 

Distinct: 3
Distinct (%): 0.1%
Missing: 666
Missing (%): 17.2%
Memory size: 245.8 KiB

DECEASED: 2569
ALIVE: 512
DECEASED_NON_CANCER: 134

Length

Max length: 19
Median length: 8
Mean length: 7.9807154
Min length: 5

Characters and Unicode

Total characters: 25658
Distinct characters: 12
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: DECEASED
2nd row: DECEASED
3rd row: DECEASED
4th row: DECEASED
5th row: DECEASED

Common Values

Value | Count | Frequency (%)
DECEASED | 2569 | 66.2%
ALIVE | 512 | 13.2%
DECEASED_NON_CANCER | 134 | 3.5%
(Missing) | 666 | 17.2%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
deceased | 2569 | 79.9%
alive | 512 | 15.9%
deceased_non_cancer | 134 | 4.2%
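The same counts appear with two different percentages: DECEASED is 66.2% in the Common Values table and 79.9% in the table just above. The difference is the denominator, all rows versus non-missing rows. The sketch below reproduces both from the reported counts.

```python
# Counts copied from the os_status tables in this report.
counts = {"DECEASED": 2569, "ALIVE": 512, "DECEASED_NON_CANCER": 134}
n_missing = 666

n_valid = sum(counts.values())    # rows with a recorded status
n_rows = n_valid + n_missing      # all rows in the dataset

for label, count in counts.items():
    share_all = 100 * count / n_rows      # missing rows included
    share_valid = 100 * count / n_valid   # missing rows excluded
    print(f"{label}: {share_all:.1f}% of all rows, {share_valid:.1f}% of recorded")
```

In pandas the two views correspond to `value_counts(normalize=True, dropna=False)` and `value_counts(normalize=True)`. Which one is appropriate depends on whether a missing status is itself informative for the analysis.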

Most occurring characters

Value | Count | Frequency (%)
E | 8755 | 34.1%
D | 5406 | 21.1%
A | 3349 | 13.1%
C | 2971 | 11.6%
S | 2703 | 10.5%
L | 512 | 2.0%
I | 512 | 2.0%
V | 512 | 2.0%
N | 402 | 1.6%
_ | 268 | 1.0%
Other values (2) | 268 | 1.0%

Most occurring categories

(unknown): 25658 (100.0%)

Most occurring scripts

(unknown): 25658 (100.0%)

Most occurring blocks

(unknown): 25658 (100.0%)

source
Categorical

High correlation 

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 237.0 KiB

SEER: 2429
CBioPortal: 796
COSMIC: 563
GDC: 74
PDMR: 19

Length

Max length: 10
Median length: 4
Mean length: 5.5016748
Min length: 3

Characters and Unicode

Total characters: 21352
Distinct characters: 17
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: COSMIC
2nd row: COSMIC
3rd row: COSMIC
4th row: COSMIC
5th row: COSMIC

Common Values

Value | Count | Frequency (%)
SEER | 2429 | 62.6%
CBioPortal | 796 | 20.5%
COSMIC | 563 | 14.5%
GDC | 74 | 1.9%
PDMR | 19 | 0.5%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
seer | 2429 | 62.6%
cbioportal | 796 | 20.5%
cosmic | 563 | 14.5%
gdc | 74 | 1.9%
pdmr | 19 | 0.5%

Most occurring characters

Value | Count | Frequency (%)
E | 4858 | 22.8%
S | 2992 | 14.0%
R | 2448 | 11.5%
C | 1996 | 9.3%
o | 1592 | 7.5%
P | 815 | 3.8%
B | 796 | 3.7%
i | 796 | 3.7%
r | 796 | 3.7%
t | 796 | 3.7%
Other values (7) | 3467 | 16.2%

Most occurring categories

(unknown): 21352 (100.0%)

Most occurring scripts

(unknown): 21352 (100.0%)

Most occurring blocks

(unknown): 21352 (100.0%)

tumor_grade
Categorical

High correlation 

Distinct: 4
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 250.8 KiB

Unknown: 1688
High grade: 1672
Low grade: 277
Intermediate grade: 244

Length

Max length: 18
Median length: 10
Mean length: 9.1267715
Min length: 7

Characters and Unicode

Total characters: 35421
Distinct characters: 18
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Unknown
2nd row: Unknown
3rd row: Unknown
4th row: Unknown
5th row: Unknown

Common Values

Value | Count | Frequency (%)
Unknown | 1688 | 43.5%
High grade | 1672 | 43.1%
Low grade | 277 | 7.1%
Intermediate grade | 244 | 6.3%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
grade | 2193 | 36.1%
unknown | 1688 | 27.8%
high | 1672 | 27.5%
low | 277 | 4.6%
intermediate | 244 | 4.0%
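This table counts words rather than category values, which is why "grade" appears 2193 times and the percentages differ from the Common Values table. The sketch below rebuilds it from the category counts, assuming the profiler lowercases each value and splits it on whitespace. That assumption reproduces the figures but is not confirmed by the report.

```python
from collections import Counter

# Category counts copied from the tumor_grade Common Values table.
value_counts = {
    "Unknown": 1688,
    "High grade": 1672,
    "Low grade": 277,
    "Intermediate grade": 244,
}

tokens = Counter()
for value, count in value_counts.items():
    # Assumed tokenization: lowercase, then split on whitespace.
    for word in value.lower().split():
        tokens[word] += count

total = sum(tokens.values())
for word, count in tokens.most_common():
    print(f"{word}: {count} ({100 * count / total:.1f}%)")
```

Because "Unknown" is stored as a category value, the column reports 0 missing cells even though 43.5% of rows carry no grade.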

Most occurring characters

Value | Count | Frequency (%)
n | 5308 | 15.0%
g | 3865 | 10.9%
e | 2925 | 8.3%
d | 2437 | 6.9%
r | 2437 | 6.9%
a | 2437 | 6.9%
(space) | 2193 | 6.2%
o | 1965 | 5.5%
w | 1965 | 5.5%
i | 1916 | 5.4%
Other values (8) | 7973 | 22.5%

Most occurring categories

(unknown): 35421 (100.0%)

Most occurring scripts

(unknown): 35421 (100.0%)

Most occurring blocks

(unknown): 35421 (100.0%)

mutated_genes
Text

Missing 

Distinct: 283
Distinct (%): 21.2%
Missing: 2545
Missing (%): 65.6%
Memory size: 170.7 KiB

Length

Max length: 144
Median length: 7
Mean length: 12.741766
Min length: 7

Characters and Unicode

Total characters: 17023
Distinct characters: 42
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 204
Unique (%): 15.3%

Sample

1st row: ['KIT']
2nd row: ['KIT']
3rd row: ['KIT']
4th row: ['KIT']
5th row: ['KIT']
Value | Count | Frequency (%)
kit | 1125 | 51.3%
pdgfra | 91 | 4.2%
rb1 | 51 | 2.3%
tp53 | 45 | 2.1%
nf1 | 42 | 1.9%
max | 41 | 1.9%
setd2 | 38 | 1.7%
mga | 33 | 1.5%
braf | 30 | 1.4%
pten | 27 | 1.2%
Other values (221) | 669 | 30.5%
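The sample rows show that mutated_genes stores each gene list as the text of a Python list literal, such as ['KIT'], so the column has to be parsed before genes can be counted per sample. The sketch below uses `ast.literal_eval` from the standard library. The two-gene row is a hypothetical example in the same format, and `parse_genes` is an illustrative name.

```python
import ast
from collections import Counter

def parse_genes(cell):
    """Turn "['KIT', 'PDGFRA']" into ['KIT', 'PDGFRA']; missing cells give []."""
    if not isinstance(cell, str):   # covers None and float NaN
        return []
    return list(ast.literal_eval(cell))

# "['KIT']" is taken from the report; the second row is made up for illustration.
cells = ["['KIT']", "['KIT', 'PDGFRA']", None]

gene_counts = Counter(gene for cell in cells for gene in parse_genes(cell))
print(gene_counts.most_common())
```

`ast.literal_eval` only evaluates literals, so it is a safer choice than `eval` for text read from a file.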

Most occurring characters

Value | Count | Frequency (%)
' | 4384 | 25.8%
T | 1502 | 8.8%
[ | 1336 | 7.8%
] | 1336 | 7.8%
K | 1282 | 7.5%
I | 1209 | 7.1%
(space) | 856 | 5.0%
, | 856 | 5.0%
A | 417 | 2.4%
R | 410 | 2.4%
Other values (32) | 3435 | 20.2%

Most occurring categories

(unknown): 17023 (100.0%)

Most occurring scripts

(unknown): 17023 (100.0%)

Most occurring blocks

(unknown): 17023 (100.0%)

Interactions

[Figure: 36 pairwise interaction plots (images not preserved)]

Correlations

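Columns, in order: (1) Unnamed: 0, (2) age_at_diagnosis, (3) gender, (4) metastatic_site, (5) mitotic_rate, (6) os_status, (7) primary_site, (8) race, (9) sample_coverage, (10) sample_type, (11) source, (12) stage_at_diagnosis, (13) treatment, (14) treatment_response, (15) tumor_grade, (16) tumor_purity, (17) tumor_size

Variable | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17
Unnamed: 0 | 1.000 | 0.178 | 0.143 | 0.303 | -0.184 | 0.478 | 0.240 | 0.227 | -0.376 | 0.560 | 0.662 | 0.356 | 0.436 | 0.517 | 0.459 | -0.008 | -0.161
age_at_diagnosis | 0.178 | 1.000 | 0.075 | 0.179 | -0.026 | 0.220 | 0.086 | 0.103 | -0.152 | 0.185 | 0.172 | 0.069 | 0.138 | 0.115 | 0.149 | 0.071 | -0.029
gender | 0.143 | 0.075 | 1.000 | 0.216 | 0.119 | 0.031 | 0.113 | 0.085 | 0.146 | 0.113 | 0.142 | 0.112 | 0.158 | 0.125 | 0.141 | 0.000 | 0.132
metastatic_site | 0.303 | 0.179 | 0.216 | 1.000 | 0.000 | 0.391 | 0.189 | 0.153 | 0.239 | 0.512 | 1.000 | 0.454 | 0.174 | 0.235 | 1.000 | 0.190 | 0.153
mitotic_rate | -0.184 | -0.026 | 0.119 | 0.000 | 1.000 | 0.219 | 0.076 | 0.059 | 0.010 | 0.000 | 1.000 | 0.346 | 0.000 | 0.134 | 1.000 | 0.108 | 0.240
os_status | 0.478 | 0.220 | 0.031 | 0.391 | 0.219 | 1.000 | 0.259 | 0.286 | 0.161 | 0.560 | 0.766 | 0.226 | 0.564 | 0.312 | 0.448 | 0.130 | 0.235
primary_site | 0.240 | 0.086 | 0.113 | 0.189 | 0.076 | 0.259 | 1.000 | 0.281 | 0.094 | 0.313 | 0.399 | 0.254 | 0.200 | 0.197 | 0.263 | 0.089 | 0.063
race | 0.227 | 0.103 | 0.085 | 0.153 | 0.059 | 0.286 | 0.281 | 1.000 | 0.000 | 0.293 | 0.459 | 0.139 | 0.207 | 0.091 | 0.250 | 0.054 | 0.000
sample_coverage | -0.376 | -0.152 | 0.146 | 0.239 | 0.010 | 0.161 | 0.094 | 0.000 | 1.000 | 0.227 | 1.000 | 0.209 | 0.093 | 0.143 | 1.000 | 0.072 | 0.047
sample_type | 0.560 | 0.185 | 0.113 | 0.512 | 0.000 | 0.560 | 0.313 | 0.293 | 0.227 | 1.000 | 0.635 | 0.309 | 0.646 | 0.350 | 0.474 | 0.025 | 0.075
source | 0.662 | 0.172 | 0.142 | 1.000 | 1.000 | 0.766 | 0.399 | 0.459 | 1.000 | 0.635 | 1.000 | 0.598 | 0.861 | 0.598 | 0.504 | 0.000 | 1.000
stage_at_diagnosis | 0.356 | 0.069 | 0.112 | 0.454 | 0.346 | 0.226 | 0.254 | 0.139 | 0.209 | 0.309 | 0.598 | 1.000 | 0.531 | 0.626 | 0.287 | 0.000 | 0.236
treatment | 0.436 | 0.138 | 0.158 | 0.174 | 0.000 | 0.564 | 0.200 | 0.207 | 0.093 | 0.646 | 0.861 | 0.531 | 1.000 | 0.483 | 0.497 | 0.000 | 0.000
treatment_response | 0.517 | 0.115 | 0.125 | 0.235 | 0.134 | 0.312 | 0.197 | 0.091 | 0.143 | 0.350 | 0.598 | 0.626 | 0.483 | 1.000 | 0.080 | 0.088 | 0.112
tumor_grade | 0.459 | 0.149 | 0.141 | 1.000 | 1.000 | 0.448 | 0.263 | 0.250 | 1.000 | 0.474 | 0.504 | 0.287 | 0.497 | 0.080 | 1.000 | 1.000 | 1.000
tumor_purity | -0.008 | 0.071 | 0.000 | 0.190 | 0.108 | 0.130 | 0.089 | 0.054 | 0.072 | 0.025 | 0.000 | 0.000 | 0.000 | 0.088 | 1.000 | 1.000 | -0.002
tumor_size | -0.161 | -0.029 | 0.132 | 0.153 | 0.240 | 0.235 | 0.063 | 0.000 | 0.047 | 0.075 | 1.000 | 0.236 | 0.000 | 0.112 | 1.000 | -0.002 | 1.000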
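The "High correlation" alerts in this report can be related back to this matrix by listing the pairs whose absolute coefficient exceeds a cut-off. The sketch below does so for a handful of entries copied from the matrix. The threshold of 0.5 is an assumption for illustration; the report does not state the value it used.

```python
THRESHOLD = 0.5  # assumed cut-off, not taken from the report

# A few entries copied from the correlation matrix in this report.
corr = {
    ("source", "treatment"): 0.861,
    ("source", "os_status"): 0.766,
    ("os_status", "treatment"): 0.564,
    ("tumor_purity", "tumor_grade"): 1.000,
    ("age_at_diagnosis", "gender"): 0.075,
    ("sample_coverage", "Unnamed: 0"): -0.376,
}

# Keep pairs at or above the threshold, strongest first.
flagged = sorted(
    ((a, b, value) for (a, b), value in corr.items() if abs(value) >= THRESHOLD),
    key=lambda item: -abs(item[2]),
)

for a, b, value in flagged:
    print(f"{a} ~ {b}: {value:+.3f}")
```

Coefficients of exactly 1.000 between sparsely populated columns and source or tumor_grade deserve caution. With roughly 80% of values missing they may reflect which source supplies a field rather than a real relationship.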

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

(index) | Unnamed: 0 | sample_id | patient_id | age_at_diagnosis | stage_at_diagnosis | tumor_size | mitotic_rate | treatment | treatment_response | primary_site | sample_type | race | gender | metastatic_site | tumor_purity | sample_coverage | os_months | treatment_start | os_status | source | tumor_grade | mutated_genes
0 | 0 | COSS1030183 | 924209 | 69.0 | Unknown | NaN | NaN | IMATINIB | PR | Stomach | Primary | NaN | Male | NaN | NaN | NaN | NaN | MISSING_DATE | NaN | COSMIC | Unknown | ['KIT']
1 | 1 | COSS1030184 | 924209 | 69.0 | Unknown | NaN | NaN | IMATINIB | PR | Stomach | Primary | NaN | Male | NaN | NaN | NaN | NaN | MISSING_DATE | NaN | COSMIC | Unknown | ['KIT']
2 | 2 | COSS1035469 | 929361 | 52.0 | Unknown | NaN | NaN | IMATINIB | NR | Small Intestine | Metastasis | NaN | Female | NaN | NaN | NaN | NaN | MISSING_DATE | NaN | COSMIC | Unknown | ['KIT']
3 | 3 | COSS1035470 | 929361 | 52.0 | Unknown | NaN | NaN | IMATINIB | PR | Small Intestine | Metastasis | NaN | Female | NaN | NaN | NaN | NaN | MISSING_DATE | NaN | COSMIC | Unknown | ['KIT']
4 | 4 | COSS1036012 | 929884 | 57.0 | Unknown | NaN | NaN | IMATINIB | CR | Small Intestine | Unknown | NaN | Female | NaN | NaN | NaN | NaN | MISSING_DATE | NaN | COSMIC | Unknown | ['KIT']
5 | 5 | COSS1036013 | 929885 | 45.0 | Unknown | NaN | NaN | IMATINIB | CR | Stomach | Unknown | NaN | Male | NaN | NaN | NaN | NaN | MISSING_DATE | NaN | COSMIC | Unknown | ['KIT']
6 | 6 | COSS1036014 | 929886 | 51.0 | Unknown | NaN | NaN | IMATINIB | CR | Small Intestine | Unknown | NaN | Female | NaN | NaN | NaN | NaN | MISSING_DATE | NaN | COSMIC | Unknown | ['KIT']
7 | 7 | COSS1046535 | 939749 | 58.0 | Unknown | NaN | NaN | IMATINIB | PR | Small Intestine | Local Recurrence | NaN | Female | NaN | NaN | NaN | NaN | MISSING_DATE | NaN | COSMIC | Unknown | ['KIT']
8 | 8 | COSS1046536 | 939749 | 58.0 | Unknown | NaN | NaN | IMATINIB | NR | Small Intestine | Metastasis | NaN | Female | NaN | NaN | NaN | NaN | MISSING_DATE | NaN | COSMIC | Unknown | ['KIT']
9 | 9 | COSS1117717 | 1005414 | 57.0 | Unknown | NaN | NaN | IMATINIB | CR | Colon/Rectum | Unknown | NaN | Male | NaN | NaN | NaN | NaN | MISSING_DATE | NaN | COSMIC | Unknown | ['KIT']

(index) | Unnamed: 0 | sample_id | patient_id | age_at_diagnosis | stage_at_diagnosis | tumor_size | mitotic_rate | treatment | treatment_response | primary_site | sample_type | race | gender | metastatic_site | tumor_purity | sample_coverage | os_months | treatment_start | os_status | source | tumor_grade | mutated_genes
3871 | 3871 | NaN | eacfb466-572c-43da-8efa-2fa76b54f924 | 72.0 | metastasis | NaN | NaN | UNKNOWN | NaN | Small Intestine | Metastasis | NaN | Female | NaN | 70.0 | NaN | NaN | MISSING_DATE | NaN | GDC | Unknown | NaN
3872 | 3872 | NaN | ef5cdca3-6945-4bcd-9551-ab65ac508399 | 54.0 | metastasis | NaN | NaN | UNKNOWN | NaN | Small Intestine | Metastasis | NaN | Female | NaN | 40.0 | NaN | NaN | MISSING_DATE | NaN | GDC | Unknown | NaN
3873 | 3873 | NaN | f126e7c0-23a5-4966-9b6e-45bea9b4310a | 26.0 | metastasis | NaN | NaN | UNKNOWN | NaN | Soft Tissue | Metastasis | NaN | Male | NaN | 50.0 | NaN | NaN | MISSING_DATE | NaN | GDC | Unknown | NaN
3874 | 3874 | NaN | f29b7559-d664-4ffb-88c0-77a2d2df659b | 66.0 | metastasis | NaN | NaN | UNKNOWN | NaN | Small Intestine | Metastasis | NaN | Male | NaN | 80.0 | NaN | NaN | MISSING_DATE | NaN | GDC | Unknown | NaN
3875 | 3875 | NaN | f33ce1b9-a7ca-4659-b747-c5bba5949585 | 69.0 | metastasis | NaN | NaN | UNKNOWN | NaN | Soft Tissue | Metastasis | NaN | Female | NaN | 60.0 | NaN | NaN | MISSING_DATE | NaN | GDC | Unknown | NaN
3876 | 3876 | NaN | faaa8f08-cfc4-4aec-a350-bfaf7c6458eb | 58.0 | metastasis | NaN | NaN | UNKNOWN | NaN | Stomach | Metastasis | NaN | Male | NaN | 40.0 | NaN | NaN | MISSING_DATE | NaN | GDC | Unknown | NaN
3877 | 3877 | NaN | fae9b190-d0d8-42ac-88b7-0c94f301ff23 | 48.0 | metastasis | NaN | NaN | UNKNOWN | NaN | Soft Tissue | Metastasis | NaN | Male | NaN | 80.0 | NaN | NaN | MISSING_DATE | NaN | GDC | Unknown | NaN
3878 | 3878 | NaN | fbcb378e-b5f2-4c5f-b9a2-c14b5d620198 | 58.0 | metastasis | NaN | NaN | UNKNOWN | NaN | Stomach | Metastasis | NaN | Male | NaN | 70.0 | NaN | NaN | MISSING_DATE | NaN | GDC | Unknown | NaN
3879 | 3879 | NaN | fd9c738b-ac3a-4a95-aa1c-30b37254f639 | 43.0 | metastasis | NaN | NaN | UNKNOWN | NaN | Stomach | Metastasis | NaN | Male | NaN | 60.0 | NaN | NaN | MISSING_DATE | NaN | GDC | Unknown | NaN
3880 | 3880 | NaN | ffb0514c-62a4-4970-b825-d49a0e570550 | 65.0 | metastasis | NaN | NaN | UNKNOWN | NaN | Retroperitoneum | Metastasis | NaN | Female | NaN | 60.0 | NaN | NaN | MISSING_DATE | NaN | GDC | Unknown | NaN